Characterizations of Recursively Enumerable Languages by Means of Insertion Grammars

Authors

  • Carlos Martín-Vide
  • Gheorghe Paun
  • Arto Salomaa
Abstract

An insertion grammar is based on pure rules of the form $uv \to uxv$ (the string $x$ is inserted in the context $(u,v)$). A strict subfamily of the context-sensitive family is obtained, incomparable with the family of linear languages. We prove here that each recursively enumerable language can be written as the weak coding of the image by an inverse morphism of a language generated by an insertion grammar (with the maximal length of the strings $u$, $v$ as above equal to seven). This result is rather surprising in view of some closure properties established earlier in the literature. Some consequences of this result are also stated. When erasing rules of the form $uxv \to uv$ are also present (the string $x$ is erased from the context $(u,v)$), a much easier representation of recursively enumerable languages is obtained, as the intersection with $V^*$ of a language generated by an insertion grammar with erased strings (having the maximal length of the strings $u$, $v$ as above equal to two).

© 1998 Elsevier Science B.V. All rights reserved. PII S0304-3975(97)00079-0. C. Martín-Vide et al. / Theoretical Computer Science 205 (1998) 195-205.

* Corresponding author. E-mail: [email protected].
* Research supported by the Spanish Secretaría de Estado de Universidades e Investigación, SAB95-0357, and the Academy of Finland, Project 11281.

1. Insertion grammars

Most of the generative mechanisms investigated in formal language theory (Thue systems, Post systems, Chomsky grammars, pure grammars, Lindenmayer systems, etc.) are based on the operation of rewriting; see, e.g., [13, 14]. However, there are several classes of grammars whose basic ingredient is the adjoining operation. The most important of them are the tree adjoining grammars (TAG) [5], the contextual grammars [7], and the insertion grammars [4], all three introduced as models of constructions in natural languages.
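The insertion operation described in the abstract (a rule $uv \to uxv$ inserts $x$ between $u$ and $v$) can be sketched in Python as follows. This is only an illustrative sketch: the function names `insert_step`, `generate`, and `weight`, the representation of rules as `(u, x, v)` string triples, and the toy grammar at the end are assumptions made here, not part of the paper.

```python
from collections import deque

def insert_step(w, rules):
    """Return all strings obtained from w by one insertion step.

    Each rule is a triple (u, x, v): x is inserted between an
    occurrence of u and an immediately following occurrence of v.
    """
    results = set()
    for u, x, v in rules:
        context = u + v
        start = w.find(context)
        while start != -1:
            cut = start + len(u)          # position right after u
            results.add(w[:cut] + x + w[cut:])
            start = w.find(context, start + 1)
    return results

def generate(axioms, rules, max_len):
    """Enumerate all generated strings of length at most max_len."""
    seen = {w for w in axioms if len(w) <= max_len}
    queue = deque(seen)
    while queue:
        w = queue.popleft()
        for z in insert_step(w, rules):
            if len(z) <= max_len and z not in seen:
                seen.add(z)
                queue.append(z)
    return seen

def weight(rules):
    """Maximal length of a context string u or v over all rules."""
    return max((max(len(u), len(v)) for u, _, v in rules), default=0)

# Hypothetical toy grammar: axiom "ab", one rule inserting "ab" between a and b.
toy_rules = [("a", "ab", "b")]
print(sorted(generate({"ab"}, toy_rules, 6), key=len))  # ['ab', 'aabb', 'aaabbb']
print(weight(toy_rules))                                # 1
```

In the toy grammar the context $ab$ occurs only in the middle of $a^n b^n$, so up to the length bound the sketch enumerates exactly the strings $a^n b^n$ with $n \geq 1$, using contexts of length one.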
The insertion grammars (called semi-contextual grammars in [4]) are somewhat intermediate between Chomsky context-sensitive grammars (where the nonterminals are rewritten according to specified contexts) and Marcus contextual grammars (where contexts are adjoined to specified strings associated with contexts). In insertion grammars, strings are adjoined depending on contexts: one gives triples of the form $(u,x,v)$, defining a substitution of $uv$ by $uxv$ (the adjoining of $x$ in the context $(u,v)$). Thus, insertion grammars can also be seen as pure grammars whose rules are of the form $uv \to uxv$ (that is, length-increasing pure grammars [8] of a particular form).

Formally, an insertion grammar is a triple $G = (V, S, P)$, where $V$ is an alphabet, $S$ is a finite language over $V$, and $P$ is a finite set of triples of the form $(u,x,v)$, with $u,x,v \in V^*$. (As usual, we denote by $V^*$ the free monoid generated by the alphabet $V$ under the operation of concatenation; the empty string is denoted by $\lambda$. We also denote by FIN, REG, LIN, CF, CS, RE the families of finite, regular, linear, context-free, context-sensitive, and recursively enumerable languages, respectively. For other elementary notions of formal language theory, we refer to [14, 13].)

With respect to an insertion grammar $G = (V, S, P)$ we define the relation $\Longrightarrow$ on $V^*$ by

$$w \Longrightarrow z \quad \text{iff} \quad w = w_1 u v w_2,\ z = w_1 u x v w_2 \ \text{ for } (u,x,v) \in P,\ w_1, w_2 \in V^*.$$

Then, the language generated by $G$ is defined by

$$L(G) = \{ z \in V^* \mid w \Longrightarrow^* z,\ w \in S \}.$$

Clearly, the insertion rules of the form $(u, \lambda, v)$ are of no use, hence in what follows we shall assume that no such rule appears in our grammars.

For an insertion grammar $G = (V, S, P)$ we denote

$$\mathrm{weight}(G) = \max\{ |u| \mid (u,x,v) \in P \text{ or } (v,x,u) \in P \}.$$

The family of languages $L(G)$ generated by insertion grammars of weight at most $n$, $n \geq 0$, is denoted by $INS_n$; the union of all these families is denoted by $INS_\infty$.

Proofs of the following basic results about families of insertion languages can be found in [4, 9, 10, 15].
1. $FIN \subset INS_0 \subset INS_1 \subset \cdots \subset INS_\infty \subset CS$.
2. REG is incomparable with all families $INS_n$, $n > 1$, and $REG \subset INS_\infty$.
3. $INS_1 \subset CF$, but CF is incomparable with all $INS_n$, $n \geq 2$, and $INS_\infty$; $INS_2$ contains non-semilinear languages.
4. LIN is incomparable with all $INS_n$, $n > 0$, and $INS_\infty$.
5. All families $INS_n$, $n \geq 0$, are anti-AFLs (that is, they are closed under none of the following operations: union, concatenation, Kleene *, direct and inverse morphisms, intersection with regular languages).
6. Each regular language is the morphic image of a language in $INS_1$.

In view of these poor closure properties (a feature specific to all rewriting systems not using nonterminal symbols), it is of interest to look for the smallest AFL (or related structure) containing a given family $INS_n$, $n > 0$. As we shall see in the following section, the result is unexpected: the closure of $INS_7$ under direct and inverse morphisms is equal to the family of recursively enumerable languages. Contrast this with the fact that all families $INS_n$ are incomparable with LIN. Since an insertion grammar only adds symbols to the currently generated string, so that its capability to change the string looks quite restricted, our characterization of RE is rather surprising.

Our result bears some similarity to the characterizations of RE by contextual languages in [3, 2], but note that in [3] pairs of strings are adjoined, hence we can easily mark substrings $u$ of the current string where type-0 Chomsky rules $u \to v$ are simulated, whereas in [2] one uses infinitely many rules, under the form of context-free selectors associated with contexts. These differences between insertion grammars and the contextual grammars used in [3, 2] make new proof techniques necessary, leading to more complex constructions in the case of insertion grammars.

2. A characterization of RE

Theorem 1.
For each language $L \in RE$ there are a morphism $h$, a weak coding $g$, and a language $L' \in INS_7$ such that $L = g(h^{-1}(L'))$.

Proof. Consider a language $L \subseteq T^*$, $L \in RE$, generated by a type-0 Chomsky grammar $G = (N, T, S, P)$ in Kuroda normal form, that is, with the rules in $P$ of the following types:
1. $A \to BC$, $A \to a$, $A \to \lambda$, for $A, B, C \in N$, $a \in T$,
2. $AB \to CD$, for $A, B, C, D \in N$.

From the form of the rules, we may assume that each string in $L(G)$ is generated by a derivation consisting of two phases, one when only nonterminal rules are used and one when only terminal rules are used; moreover, we may assume that during the second phase the derivation is performed in the leftmost mode.

Consider the new symbols $\#$, $\$$, $c$ and construct the insertion grammar $G' = (N \cup T \cup \{\#, \$, c\}, \{c^4 S c^6\}, P')$, with $P'$ containing the following insertion rules.

(1) For each context-free rule $r : A \to x \in P$ we consider the rules
(1.r) $(\alpha_1\alpha_2\alpha_3\alpha_4 A,\ \#\$x,\ \alpha_5\alpha_6\alpha_7\alpha_8\alpha_9\alpha_{10})$, for $\alpha_i \in N \cup \{\#, \$, c\}$, $1 \leq i \leq 10$, $\alpha_3\alpha_4 \notin N\{\$\}$, $\alpha_2\alpha_3\alpha_4 \notin N\{\$\}N$, $\alpha_1\alpha_2\alpha_3\alpha_4 \notin N\{\$\}NN$, $\alpha_5 \notin \{\#, \$\}$, and if $\alpha_5\alpha_6\alpha_7\alpha_8\alpha_9 \in N\{\#\$\}N\{\#\}$, then $\alpha_{10} \in N \cup \{c\}$.

(2) For each non-context-free rule $r : AB \to CD \in P$ we consider the rules
(2.r.1) $(\alpha_1\alpha_2\alpha_3 A,\ \$CD,\ B\alpha_4)$, for $\alpha_i \in N \cup \{\#, \$, c\}$, $1 \leq i \leq 4$, and $\alpha_1\alpha_2\alpha_3 \notin N\{\$\}N$, $\alpha_2\alpha_3 \notin N\{\$\}$, $\alpha_4 \notin \{\#, \$\}$,
(2.r.2) $(A\$CDB,\ \#\$,\ \alpha)$, for $\alpha \in N \cup \{c\}$,
(2.r.3) $(A,\ \#,\ \$CDB\#\$)$.

(3) For each $A, B \in N$ we consider the rules
(3.AB.1) $(\alpha_1\alpha_2\alpha_3 AB\#\$,\ A\#,\ \alpha_4\alpha_5\alpha_6)$, for $\alpha_i \in N \cup \{\#, \$, c\}$, $1 \leq i \leq 6$, $\alpha_1\alpha_2\alpha_3 \notin N\{\$\}N$, and if $\alpha_4\alpha_5 = A\#$, then $\alpha_6 = \$$,
(3.AB.2) $(A,\ \#\$,\ B\#\$A\#\alpha)$, for $\alpha \in N \cup \{c\}$,
(3.AB.3) $(\$B\#\$A\#,\ \$A,\ \alpha)$, for $\alpha \in N \cup \{c\}$.

We say that all rules (1.r) are of type 1, all rules (2.r.i), for $r$ a non-context-free rule in $P$ and $1 \leq i \leq 3$, are of type 2, and all rules (3.AB.i), $A, B \in N$ and $1 \leq i \leq 3$, are of type 3.

Denote by $U$ the set of strings $\alpha\#\$$, for $\alpha \in N \cup T$. For each string $w \in U$ we consider a symbol $b_w$. Let $W$ be the set of these symbols.
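As an illustration of how the three type 2 rules listed above act in sequence, the following Python sketch applies instances of (2.r.1), (2.r.2), and (2.r.3) for a rule $r : AB \to CD$ to a sample string. The helper `apply_rule`, the single-character nonterminal names, the sample string $c^4ABc^6$, and the choice of $c$ for every context symbol $\alpha_i$ are assumptions made for this example only; with that choice the side conditions on the $\alpha_i$ hold trivially.

```python
def apply_rule(w, u, x, v):
    """Return every string obtained by inserting x between u and v in w."""
    results = []
    context = u + v
    start = w.find(context)
    while start != -1:
        cut = start + len(u)              # position right after u
        results.append(w[:cut] + x + w[cut:])
        start = w.find(context, start + 1)
    return results

# Hypothetical nonterminals of a rule r : AB -> CD (single characters).
A, B, C, D = "A", "B", "C", "D"

# Sample string (an assumption): AB surrounded by the padding symbol c.
w = "c" * 4 + A + B + "c" * 6

steps = [
    ("ccc" + A, "$" + C + D, B + "c"),    # (2.r.1) with all alpha_i = c
    (A + "$" + C + D + B, "#$", "c"),     # (2.r.2) with alpha = c
    (A, "#", "$" + C + D + B + "#$"),     # (2.r.3)
]

trace = [w]
for u, x, v in steps:
    # Tuple unpacking fails unless the context occurs exactly once.
    (w,) = apply_rule(w, u, x, v)
    trace.append(w)

for s in trace:
    print(s)
# ccccABcccccc
# ccccA$CDBcccccc
# ccccA$CDB#$cccccc
# ccccA#$CDB#$cccccc
```

In the final string each of $A$ and $B$ is immediately followed by $\#\$$, while $C$ and $D$ are not; the longest context used in these three steps has length six.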
We define the morphism $h : (W \cup T \cup \{c\})^* \to (N \cup T \cup \{\#, \$, c\})^*$ by $h(b_w) = w$, $w \in U$; $h(a) = a$, $a \in T$; $h(c) = c$. Consider also the weak coding $g : (W \cup T \cup \{c\})^* \to T^*$,

Similar resources

Representations and Characterizations of Languages in Chomsky Hierarchy by Means of Insertion-Deletion Systems

Insertion-deletion operations are much investigated in linguistics and in DNA computing and several characterizations of Turing computability were obtained in this framework. In this note we contribute to this research direction with a new characterization of this type, as well as with representations of regular and context-free languages, mainly starting from context-free insertion systems of ...

Simple-Semi-Conditional Versions of Matrix Grammars with a Reduced Regulating Mechanism

This paper discusses some conditional versions of matrix grammars. It establishes several characterizations of the family of the recursively enumerable languages based on these grammars. In fact, making use of the Geffert Normal forms, the present paper demonstrates these characterizations based on matrix grammars with conditions of a limited length, a reduced number of nonterminals, and a redu...

Grammar Systems as Language Analyzers and Recursively Enumerable Languages

We consider parallel communicating grammar systems which consist of several grammars and perform derivation steps, where each of the grammars works in a parallel and synchronized manner on its own sentential form, and communication steps, where a transfer of sentential forms is done. We discuss accepting and analyzing versions of such grammar systems with context-free productions and present ch...

On the weight of universal insertion grammars

We study the computational power of pure insertion grammars. We show that pure insertion grammars of weight 3 can characterize all recursively enumerable languages. This is achieved by either applying an inverse morphism and a weak coding, or a left (right) quotient with a regular language. Consequences for the closure properties of insertion grammars are shown. We also study an application i...

At the Crossroads of Linguistics, DNA Computing, and Formal Language Theory: Characterizing RE Using Insertion-Deletion Systems

Several characterizations of recursively enumerable (RE) languages are presented, using insertion-deletion systems. Such a system generates the elements of a language by inserting and deleting words, according to their contexts (the insertion-deletion rules are triples (u, z, v), with the meaning that z can be inserted or deleted in/from the context (u, v)). Grammars based on insertion rules wer...

The Generative Power of Categorial Grammars and Head-Driven Phrase Structure Grammars with Lexical Rules

In this paper, it is shown that the addition of simple and linguistically motivated forms of lexical rules to grammatical theories based on subcategorization lists, such as categorial grammars (CG) or head-driven phrase structure grammars (HPSG), results in a system that can generate all and only the recursively enumerable languages. The proof of this result is carried out by means of a reducti...

Journal title:
  • Theor. Comput. Sci.

Volume 205

Pages 195-205

Publication date 1998